UC Berkeley at CLEF 2003 - Russian Language Experiments and Domain-Specific Cross-Language Retrieval
Abstract
As in previous years, Berkeley’s group 1 experimented with the domain-specific CLEF collection GIRT as well as with Russian as query and document language. The GIRT collection was substantially extended this year, and we were able to improve our retrieval results for the query languages German, English and Russian. For the GIRT retrieval experiments, we drew on our previous experience by combining different translations, thesaurus matching, decompounding of German compounds and a blind feedback algorithm. We find that our thesaurus matching technique is comparable to conventional machine translation for Russian-to-English and German-to-English retrieval and outperforms machine translation for English-to-German retrieval. With the introduction of a Russian document collection in CLEF 2003, we participated in the CLEF main task with monolingual and bilingual runs for the Russian collection. For bilingual retrieval our approaches were query translation (for German or English as topic languages) and ‘fast’ document translation (for English as the topic language). Document translation significantly underperformed query translation (using the PROMT translation system).
Similar resources
Language-Dependent and Language-Independent Approaches to Cross-Lingual Text Retrieval
We investigate the effectiveness of language-dependent approaches to document retrieval, such as stemming and decompounding, and contrast them with language-independent approaches, such as character n-gramming. In order to reap the benefits of more than one type of approach, we also consider the effectiveness of the combination of both types of approaches. We focus on document retrieval in ni...
Domain-Specific Russian Retrieval: A Baseline Approach
Berkeley group 2 chose to perform some very straightforward experiments in retrieval of Russian documents using queries derived from topics in all three languages. Thus we performed two runs with monolingual Russian retrieval and one cross-lingual run each with German topics and English topics. Query translation was done using the online PROMT translator (www.translate.ru). Monolingual results ...
Domain-Specific Track CLEF 2005: Overview of Results and Approaches, Remarks on the Assessment Analysis
The domain-specific track aims at mono- and cross-language information retrieval on structured scientific data. This track studies retrieval in a domain-specific context using two social science databases: The German Indexing and Retrieval Testdatabase (GIRT) (fourth version GIRT-4: German/English pseudo-parallel corpus with identical documents) with 302,638 documents in total, and the Russian Soc...
Evaluation of Cross-Language Information Retrieval Using the Domain-Specific GIRT Data as Parallel German-English Corpus
The development of the evaluation of domain-specific cross-language information retrieval (CLIR) is shown in the context of the Cross-Language Evaluation Forum (CLEF) campaigns from 2000 to 2003. The pre-conditions and the usable data and additionally available instruments are described. The main goals of this task of CLEF are to allow the evaluation of Cross-Language Information Retrieval (CLI...
Experiments in Classification Clustering and Thesaurus Expansion for Domain Specific Cross-Language Retrieval
In this paper we will describe Berkeley’s approach to the Domain Specific (DS) track for CLEF 2007. This year we are using forms of the Entry Vocabulary Indexes and Thesaurus expansion approaches used by Berkeley in 2005[10]. Despite the basic similarity of approach, we are using quite different implementations with different characteristics. We are not, however, using the tools for de-compound...